These functions do various statistical computations on single vectors. Given a numeric prefix argument, they actually pop n objects from the stack and combine them into a data vector. Each object may be either a number or a vector; if a vector, any sub-vectors inside it are “flattened” as if by v a 0; see Manipulating Vectors. By default one object is popped, which (in order to be useful) is usually a vector.
If an argument is a variable name, and the value stored in that variable is a vector, then the stored vector is used. This method has the advantage that if your data vector is large, you can avoid the slow process of manipulating it directly on the stack.
These functions are left in symbolic form if any of their arguments is not a number or a vector, e.g., if an argument is a formula or a non-vector variable. However, formulas embedded within vector arguments are accepted; the result is a symbolic representation of the computation, based on the assumption that the formula does not itself represent a vector. All varieties of numbers, including error forms and interval forms, are acceptable.
Some of the functions in this section also accept a single error form or interval as an argument. They then describe a property of the normal or uniform (respectively) statistical distribution described by the argument. The arguments are interpreted in the same way as the M argument of the random number function k r. In particular, an interval with integer limits is considered an integer distribution, so that ‘[2 .. 6)’ is the same as ‘[2 .. 5]’. An interval with at least one floating-point limit is a continuous distribution: ‘[2.0 .. 6.0)’ is not the same as ‘[2.0 .. 5.0]’!
The u # (calc-vector-count) [vcount] command computes the number of data values represented by the inputs. For example, ‘vcount(1, [2, 3], [[4, 5], [], x, y])’ returns 7. If the argument is a single vector with no sub-vectors, this simply computes the length of the vector.
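The flattening rule can be illustrated outside Calc. The following is a minimal Python sketch, not Calc's implementation: `flatten` and this Python `vcount` are hypothetical stand-ins, vectors are modeled as Python lists, and the symbols x and y are modeled as strings.

```python
def flatten(*args):
    """Flatten numbers and nested lists into one flat list of data values."""
    flat = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            flat.extend(flatten(*arg))  # recurse into sub-vectors
        else:
            flat.append(arg)            # a number, or a placeholder for a symbol
    return flat


def vcount(*args):
    """Count the data values represented by the arguments."""
    return len(flatten(*args))


# Empty sub-vectors contribute nothing; x and y each count as one value.
print(vcount(1, [2, 3], [[4, 5], [], "x", "y"]))  # 7
print(flatten(1, [2, [3, 4]], 5))                 # [1, 2, 3, 4, 5]
```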
The u + (calc-vector-sum)
[vsum] command computes the sum of the data values.
The u * (calc-vector-prod)
[vprod] command computes the product of the data
values. If the input is a single flat vector, these are the same
as V R + and V R * (see Reducing and
Mapping).
The u X (calc-vector-max)
[vmax] command computes the maximum of the data
values, and the u N (calc-vector-min)
[vmin] command computes the minimum. If the argument
is an interval, this finds the minimum or maximum value in the
interval. (Note that ‘vmax([2..6)) = 5’
as described above.) If the argument is an error form, this
returns plus or minus infinity.
The u M (calc-vector-mean)
[vmean] command computes the average (arithmetic
mean) of the data values. If the inputs are error forms
‘x +/- s’, this is the weighted mean of
the ‘x’ values with weights
‘1 / s^2’. If the inputs are not error
forms, this is simply the sum of the values divided by the count
of the values.
Note that a plain number can be considered an error form with error ‘s = 0’. If the input to u M is a mixture of plain numbers and error forms, the result is the mean of the plain numbers, ignoring all values with non-zero errors. (By the above definitions it’s clear that a plain number effectively has an infinite weight, next to which an error form with a finite weight is completely negligible.)
This function also works for distributions (error forms or intervals). The mean of an error form ‘a +/- b’ is simply ‘a’. The mean of an interval is the mean of the minimum and maximum values of the interval.
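The weighting rule for error forms can be sketched in Python. This is an illustration under stated assumptions rather than Calc's code: each data value is modeled as a hypothetical `(x, s)` pair, with `s == 0` standing for a plain number, and `weighted_mean` is a name chosen here for the example.

```python
def weighted_mean(values):
    """Mean of (x, s) pairs, weighting each x by 1 / s^2.

    A pair with s == 0 models a plain number. Such values have
    effectively infinite weight, so if any are present the result
    is the ordinary mean of the plain numbers alone.
    """
    exact = [x for x, s in values if s == 0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / s**2 for _, s in values]
    total = sum(w * x for w, (x, _) in zip(weights, values))
    return total / sum(weights)


# 10 +/- 1 has weight 1, 20 +/- 2 has weight 1/4.
print(weighted_mean([(10, 1), (20, 2)]))          # 12.0
# Plain numbers 1 and 3 dominate the error form 100 +/- 5.
print(weighted_mean([(1, 0), (3, 0), (100, 5)]))  # 2.0
```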
The I u M (calc-vector-mean-error)
[vmeane] command computes the mean of the data
points expressed as an error form. This includes the estimated
error associated with the mean. If the inputs are error forms,
the error is the square root of the reciprocal of the sum of the
reciprocals of the squares of the input errors. (I.e., the
variance is the reciprocal of the sum of the reciprocals of the
variances.) If the inputs are plain numbers, the error is equal
to the standard deviation of the values divided by the square
root of the number of values. (This works out to be equivalent to
calculating the standard deviation and then assuming each
value’s error is equal to this standard
deviation.)
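Both rules for the error of the mean can be written out as a short Python sketch. The function names are hypothetical, and the sketch assumes that "standard deviation" here means the sample standard deviation (dividing by N-1), which Python's `statistics.stdev` computes.

```python
import math
import statistics


def mean_error_from_errors(errors):
    """Error of the mean when every input carries its own error s:
    sqrt(1 / sum(1 / s^2))."""
    return math.sqrt(1.0 / sum(1.0 / s**2 for s in errors))


def mean_error_from_values(values):
    """Error of the mean for plain numbers: sample sdev / sqrt(N)."""
    return statistics.stdev(values) / math.sqrt(len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = statistics.stdev(data)

# Treating every value as having error sd gives the same result
# as dividing sd by sqrt(N).
same = math.isclose(mean_error_from_values(data),
                    mean_error_from_errors([sd] * len(data)))
print(same)  # True
```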
The H u M (calc-vector-median)
[vmedian] command computes the median of the data
values. The values are first sorted into numerical order; the
median is the middle value after sorting. (If the number of data
values is even, the median is taken to be the average of the two
middle values.) The median function is different from the other
functions in this section in that the arguments must all be real
numbers; variables are not accepted even when nested inside
vectors. (Otherwise it is not possible to sort the data values.)
If any of the input values are error forms, their error parts are
ignored.
The median function also accepts distributions. For both normal (error form) and uniform (interval) distributions, the median is the same as the mean.
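The sort-and-pick-the-middle rule for a vector of real numbers can be sketched in Python as follows. The function name `median` is chosen for this example only; distributions and error forms are not modeled.

```python
def median(values):
    """Median of real numbers: the middle value after sorting, or the
    average of the two middle values when the count is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```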
The H I u M
(calc-vector-harmonic-mean) [vhmean]
command computes the harmonic mean of the data values. This is
defined as the reciprocal of the arithmetic mean of the
reciprocals of the values.
The u G (calc-vector-geometric-mean)
[vgmean] command computes the geometric mean of the
data values. This is the nth root of the product of
the values. This is also equal to the exp of the
arithmetic mean of the logarithms of the data values.
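The harmonic and geometric means described above can be sketched in Python. The function names are hypothetical, the inputs are assumed to be positive real numbers, and `math.prod` requires Python 3.8 or later. The two geometric-mean functions correspond to the two equivalent definitions given in the text.

```python
import math


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1.0 / v for v in values)


def geometric_mean_root(values):
    """The nth root of the product of the n values."""
    return math.prod(values) ** (1.0 / len(values))


def geometric_mean_log(values):
    """exp of the arithmetic mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


data = [1, 2, 4]
# The two geometric-mean formulations agree up to rounding error.
print(math.isclose(geometric_mean_root(data), geometric_mean_log(data)))  # True
```

The logarithmic form is often preferable for long vectors, since the running product in the root form can overflow or underflow.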
The H u G [agmean] command computes
the “arithmetic-geometric mean” of two numbers taken
from the stack. This is computed by replacing the two numbers
with their arithmetic mean and geometric mean, then repeating
until the two values converge.
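The iteration can be sketched in Python as follows. This is an illustration, not Calc's algorithm: the name `agmean` is reused only for readability, the inputs are assumed to be positive real numbers, and the relative tolerance `tol` and iteration cap `max_iter` are assumptions of this sketch.

```python
import math


def agmean(a, b, tol=1e-15, max_iter=100):
    """Arithmetic-geometric mean of two positive real numbers."""
    for _ in range(max_iter):
        # Stop once the two values agree to within the relative tolerance.
        if abs(a - b) <= tol * max(abs(a), abs(b)):
            break
        # Replace the pair with its arithmetic and geometric means.
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return (a + b) / 2.0


result = agmean(1.0, 2.0)
# The result lies between the geometric and arithmetic means of the inputs.
print(math.sqrt(2.0) < result < 1.5)  # True
```

The iteration converges very quickly, so only a handful of passes are normally needed at double precision.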
The u R (calc-vector-rms)
[rms] command computes the RMS (root-mean-square) of
the data values. As its name suggests, this is the square root of
the mean of the squares of the data values.
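As a small Python sketch of this definition for real-valued data (the function name `rms` is reused here for illustration only):

```python
import math


def rms(values):
    """Square root of the mean of the squares of the values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


print(rms([1, 1, 1]))  # 1.0
```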
The u S (calc-vector-sdev) [vsdev] command computes the standard deviation of the data values. If the values are error forms, the errors are used as weights just as for u M. This is the sample standard deviation: the sum of the squares of the differences between the values and the mean of the ‘N’ values is divided by ‘N-1’, and the result is the square root of that quotient.
This function also applies to distributions. The standard deviation of a single error form is simply the error part. The standard deviation of a continuous interval happens to equal the difference between the limits, divided by ‘sqrt(12)’. The standard deviation of an integer interval is the same as the standard deviation of a vector of those integers.
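The two interval cases can be sketched in Python. The function names are hypothetical, and the integer case assumes a closed interval ‘[lo .. hi]’ with at least two integers, so that ‘[2 .. 6)’ would be passed as 2 and 5.

```python
import math
import statistics


def sdev_continuous_interval(lo, hi):
    """Standard deviation of a uniform distribution on a continuous
    interval: (hi - lo) / sqrt(12)."""
    return (hi - lo) / math.sqrt(12)


def sdev_integer_interval(lo, hi):
    """Sample standard deviation of the integers lo, lo+1, ..., hi."""
    return statistics.stdev(list(range(lo, hi + 1)))


# The continuous and integer readings of similar limits give different
# results.
print(sdev_continuous_interval(2.0, 6.0) != sdev_integer_interval(2, 5))  # True
```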
The I u S (calc-vector-pop-sdev)
[vpsdev] command computes the population
standard deviation. It is defined by the same formula as above
but dividing by ‘N’ instead of by
‘N-1’. The population standard deviation
is used when the input represents the entire set of data values
in the distribution; the sample standard deviation is used when
the input represents a sample of the set of all data values, so
that the mean computed from the input is itself only an estimate
of the true mean.
For error forms and continuous intervals, vpsdev
works exactly like vsdev. For integer intervals, it
computes the population standard deviation of the equivalent
vector of integers.
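The difference between the two divisors can be seen in a short Python sketch for unweighted plain numbers. The function names are hypothetical, and error-form weighting is not modeled.

```python
import math


def _sum_sq_dev(values):
    """Sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sample_sdev(values):
    """Sample standard deviation: divide by N-1 before the square root."""
    return math.sqrt(_sum_sq_dev(values) / (len(values) - 1))


def pop_sdev(values):
    """Population standard deviation: divide by N before the square root."""
    return math.sqrt(_sum_sq_dev(values) / len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, sum of squared deviations 32
print(pop_sdev(data))                      # 2.0
print(sample_sdev(data) > pop_sdev(data))  # True
```

The sample value is always the larger of the two for the same data, and the gap shrinks as the number of values grows.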
The H u S (calc-vector-variance) [vvar] and H I u S (calc-vector-pop-variance) [vpvar] commands compute the variance of the data values. The variance is the square of the standard deviation, i.e., the sum of the squares of the deviations of the data values from the mean, divided by ‘N-1’ for vvar or by ‘N’ for vpvar. (This definition also applies when the argument is a distribution.)
The vflat algebraic function returns a vector of its arguments, interpreted in the same way as the other functions in this section. For example, ‘vflat(1, [2, [3, 4]], 5)’ returns ‘[1, 2, 3, 4, 5]’.